dangerous behavior
Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models
Cui, Chenhang, Deng, Gelei, Zhang, An, Zheng, Jingnan, Li, Yicong, Gao, Lianli, Zhang, Tianwei, Chua, Tat-Seng
Recent advances in Large Vision-Language Models (LVLMs) have showcased strong reasoning abilities across multiple modalities, achieving significant breakthroughs in various real-world applications. Despite this great success, the safety guardrail of LVLMs may not cover the unforeseen domains introduced by the visual modality. Existing studies primarily focus on eliciting LVLMs to generate harmful responses via carefully crafted image-based jailbreaks designed to bypass alignment defenses. In this study, we reveal that a safe image can be exploited to achieve the same jailbreak consequence when combined with additional safe images and prompts. This stems from two fundamental properties of LVLMs: universal reasoning capabilities and the safety snowball effect. Building on these insights, we propose Safety Snowball Agent (SSA), a novel agent-based framework leveraging agents' autonomous and tool-using abilities to jailbreak LVLMs. SSA operates through two principal stages: (1) initial response generation, where tools generate or retrieve jailbreak images based on potential harmful intents, and (2) harmful snowballing, where refined subsequent prompts induce progressively harmful outputs. Our experiments demonstrate that SSA can use nearly any image to induce LVLMs to produce unsafe content, achieving high jailbreak success rates against the latest LVLMs. Unlike prior works that exploit alignment flaws, SSA leverages the inherent properties of LVLMs, presenting a profound challenge for enforcing safety in generative multimodal systems. Our code is available at https://github.com/gzcch/Safety_Snowball_Agent.
PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety
Zhang, Zaibin, Zhang, Yongting, Li, Lijun, Gao, Hongzhi, Wang, Lijun, Lu, Huchuan, Zhao, Feng, Qiao, Yu, Shao, Jing
Multi-agent systems, augmented with Large Language Models (LLMs), demonstrate significant capabilities for collective intelligence. However, the potential misuse of this intelligence for malicious purposes presents significant risks. To date, comprehensive research on the safety issues associated with multi-agent systems remains limited. From the perspective of agent psychology, we discover that the dark psychological states of agents can lead to severe safety issues. To address these issues, we propose a comprehensive framework grounded in agent psychology. In our framework, we focus on three aspects: identifying how dark personality traits in agents might lead to risky behaviors, designing defense strategies to mitigate these risks, and evaluating the safety of multi-agent systems from both psychological and behavioral perspectives. Our experiments reveal several intriguing phenomena, such as the collective dangerous behaviors among agents, agents' propensity for self-reflection when engaging in dangerous behavior, and the correlation between agents' psychological assessments and their dangerous behaviors. We anticipate that our framework and observations will provide valuable insights for further research into the safety of multi-agent systems. We will make our data and code publicly accessible at https://github.com/AI4Good24/PsySafe.
Can AI Video Analytics Ever Really Be Intelligent?
Video surveillance is commonly associated with security. But in most cases, it's used to record incidents and assist in investigations after the fact rather than to prevent undesirable events. Artificial intelligence–powered video analytics is a highly promising trend that fundamentally changes the way things work. Extracting manageable data from a video stream can help recognize risky situations early on, minimize damage and, ideally, avoid emergencies completely. At the same time, AI significantly expands the areas of application of video surveillance beyond security systems.
Artificial Intelligence to Make Petrol Pumps Safer by Picking Out Dangerous Behavior
Artificial intelligence (AI), machine learning (ML) and continual deep learning (DL) are the new-age digital skills that are expected to transform the consumer and enterprise experience. Due to the vast amount of data that is now available in the Internet domain, machine learning and deep learning have the capability to predict and prevent various catastrophically dangerous events. Now, Shell wants to leverage artificial intelligence to make petrol pumps a safer place. Shell has selected C3 IoT and Microsoft Azure to power a new companywide AI platform. A device inside petrol pumps running Microsoft Azure IoT Edge can use artificial intelligence tools to pick out dangerous behavior like people lighting cigarettes while waiting at the pump, people driving recklessly, theft, and improper fueling.
Twitter found to block certain words in search engine
Twitter has quietly started blocking certain words on the platform's built-in search engine. Words such as 'porn', 'nsfw', 'sex' and similar terms will no longer appear when searched under the 'Latest' tab, but racial slurs and the word 'jihad' have not been removed. Although Twitter has blocked these words from being found in the Latest tab, users can still find some of the 'forbidden' terms by searching in the 'Top' tab. Twitter says it 'prohibits the promotion of hate content, sensitive topics, and violence globally.' This policy does not apply, however, to news and information that calls attention to hate, sensitive topics, or violence without advocating for it.